Semantic Data Set Construction from Human Clustering and Spatial Arrangement

Abstract

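Research into representation learning models of lexical semantics usually utilizes some form of intrinsic evaluation to ensure that the learned representations reflect human semantic judgments. Lexical semantic similarity estimation is a widely used evaluation method, but efforts have typically focused on pairwise judgments of words in isolation, or are limited to specific contexts and lexical stimuli. There are limitations with these approaches that either do not provide any context for judgments, and thereby ignore ambiguity, or provide very specific sentential contexts that cannot then be used to generate a larger lexical resource. Furthermore, similarity between more than two items is not considered. We provide a full description and analysis of our recently proposed methodology for large-scale data set construction that produces a semantic classification of a large sample of verbs in the first phase, as well as multi-way similarity judgments made within the resultant semantic classes in the second phase. The methodology uses a spatial multi-arrangement approach proposed in the field of cognitive neuroscience for capturing multi-way similarity judgments of visual stimuli. We have adapted this method to handle polysemous linguistic stimuli and much larger samples than previous work. We specifically target verbs, but the method can equally be applied to other parts of speech. We perform cluster analysis on the data from the first phase and demonstrate how this might be useful in the construction of a comprehensive verb resource. We also analyze the semantic information captured by the second phase and discuss the potential of the spatially induced similarity judgments to better reflect human notions of word similarity. We demonstrate how the resultant data set can be used for fine-grained analyses and evaluation of representation learning models on the intrinsic tasks of semantic clustering and semantic similarity. In particular, we find that stronger static word embedding methods still outperform lexical representations emerging from more recent pre-training methods, both on word-level similarity and clustering. Moreover, thanks to the data set’s vast coverage, we are able to compare the benefits of specializing vector representations for a particular type of external knowledge by evaluating FrameNet- and VerbNet-retrofitted models on specific semantic domains such as “Heat” or “Motion.”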
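As a rough illustration of the evaluation idea in the abstract, the sketch below turns a spatial arrangement of words into pairwise distances and compares them with distances between word vectors using Spearman's rank correlation. It is a minimal sketch under stated assumptions, not the authors' pipeline: the coordinates, the two-dimensional "embeddings", and all function names are hypothetical toy values chosen for the example.

```python
import math
from itertools import combinations


def pairwise_distances(positions):
    """Euclidean distance between every pair of words placed in the arena."""
    return {
        (a, b): math.dist(positions[a], positions[b])
        for a, b in combinations(sorted(positions), 2)
    }


def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy data: where one annotator placed four verbs (x, y),
# and made-up 2-d vectors standing in for a model's word embeddings.
arrangement = {"boil": (0.0, 0.0), "simmer": (1.0, 0.0),
               "run": (8.0, 6.0), "walk": (8.0, 7.0)}
embeddings = {"boil": (1.0, 0.1), "simmer": (0.9, 0.2),
              "run": (0.1, 1.0), "walk": (0.2, 0.9)}

human = pairwise_distances(arrangement)
pairs = list(human)
model = [cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs]
rho = spearman([human[p] for p in pairs], model)
print(f"Spearman correlation between human and model distances: {rho:.3f}")
```

A higher correlation would suggest that the model's distances order word pairs similarly to the human arrangement. A real evaluation would aggregate over many annotators and trials rather than a single toy arrangement.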

Similar resources

Clustering to Reduce Spatial Data Set Size

Traditionally it had been a problem that researchers did not have access to enough spatial data to answer pressing research questions or build compelling visualizations. Today, however, the problem is often that we have too much data. Spatially redundant or approximately redundant points may refer to a single feature (plus noise) rather than many distinct spatial features. We can use density-ba...

The Clustering and Classification Data Mining Techniques in Insurance Fraud Detection: The Case of Iranian Car Insurance

Given the ever-increasing spread of fraud in the insurance sector, especially in car insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine. One of the most common and widely used ways of discovering patterns in data is the use of ...

Data Set Pagesize Clustering Merging

if the structure to be mapped is a DAG. We presented heuristics to handle the above cases. Finally we presented a performance study that bears out our analytical results, and shows that the optimal clustering and Smart-BFS techniques perform much better than pre-order clustering on path length measures, and only a little worse on pre-order traversal. Our techniques are likely to be of importanc...

Semantic Lexicon Construction: Learning from Unlabeled Data via Spectral Analysis

This paper considers the task of automatically collecting words with their entity class labels, starting from a small number of labeled examples (‘seed’ words). We show that spectral analysis is useful for compensating for the paucity of labeled examples by learning from unlabeled data. The proposed method significantly outperforms a number of methods that employ techniques such as EM and co-tr...

Mining Spatial Data via Clustering

Contributions from researchers in Knowledge Discovery are producing essential tools in order to better understand the typically large amounts of spatial data in Geographical Information Systems. Clustering techniques are proving to be valuable in providing exploratory analysis functionality while supporting approaches for automated pattern discovery in spatially referenced data and for the iden...

Journal

Journal title: Computational Linguistics

Year: 2021

ISSN: 1530-9312, 0891-2017

DOI: https://doi.org/10.1162/coli_a_00396